- %format plain
- % Knuth's article
-
- \font\logo=logo10 % font used for the METAFONT logo
- \font\logosl=logosl10 % font used for slanted METAFONT logo
-
- \def\MF{{\logo META}\-{\logo FONT}}
- \def\MFbook{{\sl The {\logosl METAFONT}\kern1pt book}}
- \def\TeX{T\hbox{\hskip-.1667em\lower.424ex\hbox{E}\hskip-.125em X}}
- \def\ldt{\mathinner{\ldotp\ldotp}}
-
- \line{\bf The New Versions of \TeX\ and \MF\ \hfill by Donald E. Knuth}
- \bigskip
- \noindent
- For more than five years I held firm to my conviction that a stable system
- was far better than a system that continues to evolve. But during the TUG
- meeting at Stanford in August, 1989, I~was persuaded to make one last set of
- changes, in order to bring \TeX\ and \MF\ to a state of completion consistent
- with their overall philosophy and goals.
-
- The main reason for the changes was the fact that I~had guessed wrong about
- 7-bit character sets versus 8-bit character sets. I~believed that standard text
- input would continue indefinitely to be confined to at most 128~characters,
- since I~did not think a keyboard with 256~different outputs would be
- especially efficient. Needless to say, I~was proved wrong, especially by
- developments in Europe and Asia. As soon as I~realized that a text formatting
- program with 7-bit input would rapidly begin to seem as archaic as the 6-bit
- systems we once had, I~knew that a fundamental revision was necessary.
-
- But the 7-bit assumption pervaded everything, so I needed to take the programs
- apart and redo them thoroughly in 8-bit style. This put \TeX\
- onto the operating table and under the knife
- for the first time since 1984, and I~had a final
- opportunity to include a few new features that had occurred to me or been
- suggested by users since then.
-
- The new extensions are entirely upward compatible with previous versions
- of \TeX\ and \MF\ (with a few small exceptions mentioned below).
- This means that error-free inputs to the old \TeX\ and \MF\ will still
- be error-free inputs to the new systems, and they will still produce the
- same outputs.
-
- However, anybody who dares to use the new extensions will be unable to get
- the desired results from old versions of \TeX\ and \MF\null. I~am therefore
- asking the \TeX\ community to update all copies of the old versions
- as soon as possible. Let us root out and destroy the obsolete 7-bit systems,
- even though we were able to do many fine things with them.
-
- In this note I'll discuss the changes, one by one; then I'll describe
- the exceptions to upward compatibility.
-
- \bigskip
- \noindent
- {\bf 1. The character set.}
- Up to 256 distinct characters are now allowed in input files. The codes that
- were formerly limited to the range $0\ldt 127$ are now in the range
- $0\ldt 255$. All characters are alike; you are free to use any character
- for any purpose in \TeX, assigning appropriate values to its
- {\tt{\char'134}catcode},
- {\tt{\char'134}mathcode},
- {\tt{\char'134}lccode},
- {\tt{\char'134}uccode},
- {\tt{\char'134}sfcode},
- and
- {\tt{\char'134}delcode}.
- Plain \TeX\ initializes these code values for characters above~127 just as
- it initializes the codes for ordinary punctuation characters
- like~`{\tt{\char'041}}'.
-
- There's a new convention for inputting an arbitrary 8-bit character
- to \TeX\ when you can't necessarily type~it: The four consecutive
- characters
- {\tt{\char'136\char'136}}$\alpha\beta$, where $\alpha$ and~$\beta$ are
- any of the ``lowercase hexadecimal digits''
- {\tt{0}},
- {\tt{1}},
- {\tt{2}},
- {\tt{3}},
- {\tt{4}},
- {\tt{5}},
- {\tt{6}},
- {\tt{7}},
- {\tt{8}},
- {\tt{9}},
- {\tt{a}},
- {\tt{b}},
- {\tt{c}},
- {\tt{d}},
- {\tt{e}},
- or
- {\tt{f}},
- are treated by \TeX\ on input as if they were a single character with
- specified code digits. For example,
- {\tt{\char'136\char'136}80}
- gives character code~128; the entire character set
- is available from
- {\tt{\char'136\char'136}00}
- to
- {\tt{\char'136\char'136}ff}.
- The old convention discussed in Appendix~C, under which character~0 was
- {\tt{\char'136\char'136\char'100}},
- character~1 (control--A) was
- {\tt{\char'136\char'136}A},
- \dots,
- and character~127 was
- {\tt{\char'136\char'136}?},
- still works for the first 128~character codes, except that the
- character following
- {\tt{\char'136\char'136}}
- should not be a lowercase hexadecimal digit when the immediately following
- character is another such digit.
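-
- As a small illustration of the hexadecimal convention, the two lines
- below let character code $14\times16+9=233$ behave like a letter.
- This is only a sketch: the choice of code~233, and the assumption that
- it stands for an accented letter on the local keyboard, are hypothetical
- and depend on the installation's character set.
-
- \halign{\qquad\qquad{\tt{#}}\hfil\cr
- {\char'134}catcode{\char'140\char'136\char'136}e9=11\cr
- {\char'134}lccode{\char'140\char'136\char'136}e9={\char'140\char'136\char'136}e9\cr}
-
- \noindent
- The first line gives the character the category code of a letter; the
- second gives it a nonzero
- {\tt{\char'134}lccode},
- which a character needs before it can take part in hyphenation.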
-
- The existence of 8-bit characters has less effect
- in \MF\ than in \TeX, because \MF's character classes are built in to each
- installation. The normal set of 95~printing characters described on
- page~51 of
- \MFbook\
- can be supplemented by extended characters as discussed on page~282, but this
- is rarely done because it leads to problems of portability. \MF's
- {\bf char} operator is now redefined to operate modulo~256 instead
- of modulo~128.
-
- \bigskip\noindent
- {\bf 2. Hyphenation tables.}
- Up to 256 distinct sets of rules for hyphenation are now allowed in \TeX.
- There's a new integer parameter called
- {\tt{\char'134}language},
- whose current value specifies the hyphenation convention in force. If
- {\tt{\char'134}language}
- is negative or greater than~255, \TeX\ acts as if
- $\hbox{\tt{\char'134}language}=0$.
-
- When you list hyphenation exceptions with \TeX's
- {\tt{\char'134}hyphenation}
- primitive, those exceptions apply to the current language only. Similarly,
- the
- {\tt{\char'134}patterns}
- primitive tells \TeX\ to remember new hyphenation patterns for the current
- language; this operation is allowed only in the special ``initialization''
- program called {\tt INITEX}\null. Hyphenation exceptions can be added at any
- time, but new patterns cannot be added after a paragraph has been typeset.
-
- When \TeX\ reads the text of a paragraph, it automatically inserts
- ``whatsit nodes'' into the horizontal list for that paragraph whenever
- a character comes from a different
- {\tt{\char'134}language}
- than its predecessor. In that way \TeX\ can tell what hyphenation
- rules to use on each word of the paragraph even if you switch
- frequently back and forth among many different languages.
-
- The special whatsit nodes are inserted automatically in unrestricted horizontal
- mode (i.e., when you are creating a paragraph, but not when you are
- specifying the contents of an hbox). You can insert a special whatsit
- yourself in restricted horizontal mode by saying
- {\tt{\char'134}language}$\langle$number$\rangle$.
- This is needed only if you are doing something tricky, like unboxing some
- contribution to a paragraph.
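-
- For instance, a hyphenation exception can be filed under one particular
- language as follows. This is only a sketch: language number~1 and the
- sample word are hypothetical choices.
-
- \halign{\qquad\qquad{\tt{#}}\hfil\cr
- {\char'134}language=1\cr
- {\char'134}hyphenation{\char'173}man-u-script{\char'175}\cr
- {\char'134}language=0\cr}
-
- \noindent
- The exception is then consulted only for words typeset while
- {\tt{\char'134}language}
- is~1; words typeset under language~0 are unaffected.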
-
- \bigskip\noindent
- {\bf 3. Hyphenated fragment control.}
- \TeX\ has new parameters
- {\tt{\char'134}lefthyphenmin}
- and
- {\tt{\char'134}righthyphenmin},
- which specify the smallest word fragments that will appear at the beginning
- or end of a word that has been hyphenated. Previously the values
- {\tt{\char'134}lefthyphenmin=2}
- and
- {\tt{\char'134}righthyphenmin=3}
- were hard-wired into \TeX\ and impossible to change. Now plain \TeX\
- format supplies the old values, which are still recommended for most
- American publications; but you can get more hyphens by decreasing these
- parameters, and you can get fewer hyphens by increasing them. If the sum of
- {\tt{\char'134}lefthyphenmin}
- and
- {\tt{\char'134}righthyphenmin}
- is~63 or more, all hyphenation is suppressed. (You can also suppress
- hyphenation by using a font with
- {\tt{\char'134}hyphenchar=-1},
- or by switching to a
- {\tt{\char'134}language}
- that has no hyphenation patterns or exceptions.)
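-
- A sketch of how these parameters might be set follows; the values 3
- and~4 are illustrative assumptions, not recommendations.
-
- \halign{\qquad\qquad{\tt{#}}\hfil\cr
- {\char'134}lefthyphenmin=3 {\char'134}righthyphenmin=4\cr
- {\char'134}hyphenchar{\char'134}font=-1\cr}
-
- \noindent
- The first line yields fewer hyphens than the old built-in values, since
- at least three letters must now precede the first hyphen of a word and
- at least four must follow the last. The second line shows the full
- syntax for suppressing hyphenation in the current font:
- {\tt{\char'134}hyphenchar}
- is always followed by a font identifier.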
-
- \bigskip\noindent
- {\bf 4. Smarter ligatures.}
- Now here's the most radical change.
- Previous versions of \TeX\ had only one kind of ligature, in which two
- characters like~`f' and~`i' were changed into a single character like~`fi'
- when they appeared consecutively. The new \TeX\ understands much more
- complex constructions by which, for example, we could change
- an~`i' following~`f' to a dotless~`\i' while the~`f' remains
- unchanged:~`f\i'.
-
- As before, you get ligatures only if they have been provided in the font
- you are using. So let's look at the new features of \MF\ by which
- enhanced ligatures can be created. A~\MF\ programmer can specify a
- ``ligature/kerning program'' for any character of the font being
- created. If, for example, the~`fi' combination appears in font
- position~12, the replacement of~`f' and~`i' by~`fi' is specified by
- including the statement
- $$\hbox{\tt{"i"~=:~12}}$$
- in the ligature/kerning program for {\tt{"f"}}; this is \MF's present
- convention.
-
- The new ligatures allow you to retain one or both of the original characters
- while inserting a new one. Instead of {\tt{=:}} you can also write
- {\tt{\char'174}=:} if you wish to retain the left character, or
- {\tt{=:{\char'174}}} if you wish to retain the right character,
- or {\tt{\char'174}=:{\char'174}} if you want to keep them both.
- For example, if the dotless~\i\ appears in font position~16, you can
- get the behavior mentioned above by having
- $$\hbox{%
- {\tt{"i" {\char'174}=: 16}}
- }$$
- in f's program.
-
- There also are four additional operators
- $$\hbox{%
- {\tt{\char'174}=:{\char'076}},\qquad
- {\tt{=:{\char'174\char'076}}},\qquad
- {\tt{\char'174}=:{\char'174\char'076}},\qquad
- {\tt{\char'174}=:{\char'174\char'076\char'076}},
- }$$
- where each {\tt\char'076} tells \TeX\ to shift its focus one position
- to the right. For example, if~f and~i had been replaced by~f
- and dotless~\i\ as above, \TeX\ would begin again to execute f's
- ligature/kern program, possibly inserting a kern before the dotless~\i,
- or possibly changing the~f to an entirely different character, etc.
- But if the instruction had been
- $$\hbox{%
- {\tt{"i" {\char'174}=:{\char'076} 16}}
- }$$
- instead, \TeX\ would turn immediately to the ligature/kern program for
- characters following character~16 (the dotless \i);
- no further change would be made between~f and~\i\ even if the font
- had something specified there.
-
- \bigskip\noindent
- {\bf 5. Boundary ligatures.}
- Every consecutive string of `characters' read by \TeX\ in horizontal mode
- (after macro expansion) can be called a `word'. (Technically we consider
- a `character' in this definition to be either a character whose
- {\tt{\char'134}catcode}
- is a
- letter or otherchar, or a control sequence that has been
- {\tt{\char'134}let}
- equal to such a character, or a control sequence that has been defined by
- {\tt{\char'134}chardef},
- or the construction
- {\tt{\char'134}char}$\langle$number$\rangle$.)
- The new \TeX\ now imagines that there is an invisible ``left boundary
- character'' just before every such word, and an invisible ``right boundary
- character'' just after it. These boundary characters take effect if the font
- designer has specified ligatures and/or kerning between them and the
- adjacent letters. Thus, the first or last character of a word can
- now be made to change its shape automatically.
-
- A ligature/kern program for the left boundary character is specified within
- \MF\ by using the special label~
- {\tt{\char'174\char'174}:}
- in a {\bf ligtable} command. A~ligature or kern with the right
- boundary character is specified by assigning a value to the new internal
- \MF\ parameter
- {\it boundarychar},
- and by specifying a ligature or kern with respect to this character.
- The
- {\it boundarychar\/}
- may or may not exist as a real character in the font.
-
- For example, suppose we want to change the first letter of a word from~`F'
- to~`ff' if we are doing some olde English. The \MF\ font designer could then
- say
- $$\hbox{ligtable {\tt{\char'174\char'174}: "F" {\char'174}=: 11}}$$
- if character 11 is the `ff'. The same ligtable instruction should
- appear in the programs for characters like~( and~` and~`` and~- that can
- precede strings of letters; then `{\tt Bassington-French}' will
- yield `Bassington-ffrench'.
-
- If the `s' of our font is the pre-19th
- century~s that looks like a mutilated~`f', and if we have a modern~`s'
- in position~128, we can convert the final~s's as Ben Franklin did by
- introducing ligature instructions such as
- $$\vcenter{\halign{{\tt{#}}\hfil$\;$&{\tt{#}}\hfil\cr
- boundarychar :=&255;\cr
- ligtable "s":&255 =:{\char'174} 128,\cr
- &"." =:{\char'174} 128,\cr
- &"," =:{\char'174} 128,\cr
- &")" =:{\char'174} 128,\cr
- &"'" =:{\char'174} 128,\cr}}$$
- and so on. (A true oldstyle font would also have
- ligatures for
- ss and si and sl and ssi and ssl
- and~st; it would be fun to create a Computer Modern Oldstyle.)
-
- The implicit left boundary character is omitted by \TeX\ if you say
- {\tt{\char'134}noboundary}
- just before the word; the implicit right boundary is omitted if you say
- {\tt{\char'134}noboundary}
- just after it.
-
- \bigskip\noindent
- {\bf 6. More compact ligatures.}
- Two or more ligtables can now share common code. To do this in \MF, you
- say `{\bf skipto}~$\langle n\rangle$' at the end of one {\bf ligtable}
- command, then you say `$\langle n\rangle$::' within another. Such local labels
- can be reused; e.g., you can say {\bf skipto}~1 again after {\tt 1::} has
- appeared, and this skips to the {\it next\/} appearance of~{\tt 1::}. There
- are 256~local labels, numbered~0 to~255. Restriction: At most 128 ligature
- or kern commands can intervene between a {\bf skipto} and its matching label.
-
- The {\tt TFM} file format has been upwardly extended to allow more than 32,500
- ligature/kern commands per font. (Previously there was an effective limit
- of 256.)
-
- \bigskip\noindent
- {\bf 7. Better looking sloppiness.}
- There is now a better way to avoid overfull boxes, for people who don't want
- to look at their documents to fix unfeasible line breaks manually. Previously
- people tried to do this by setting
- {\tt{\char'134}tolerance=10000},
- but the result was terrible because \TeX\ would tend to consolidate
- all the badness in one truly horrible line. (\TeX\ considers all badness
- $\ge10000$ to be infinitely bad, and all these infinities are equal.)
-
- The new feature is a dimension parameter called
- {\tt{\char'134}emergencystretch}.
- If
- {\tt{\char'134}emergencystretch}
- is positive and if \TeX\ has been unable to typeset a paragraph without
- exceeding the given tolerances, another pass over the paragraph is made
- in which \TeX\ pretends that additional stretchability equal to
- {\tt{\char'134}emergencystretch}
- is present in every line. The effect of this is to scale down all the
- badnesses into a range where previously infinite cases become finite;
- \TeX\ will find an optimum solution to the scaled-down problem, and this
- will be about as good as possible in a practical sense. (The extra stretching
- is not really present; therefore underfull boxes will be reported in warning
- messages unless
- {\tt{\char'134}hbadness}
- is increased.)
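-
- A worked example may clarify the scaling; the numbers are hypothetical.
- \TeX's badness is approximately $100(x/y)^3$ when a line must stretch
- by~$x$ and has total stretchability~$y$. A~line with $x=15\,{\rm pt}$
- and $y=3\,{\rm pt}$ has
- $$100\,(15/3)^3=12500,$$
- which is treated as infinitely bad. If
- {\tt{\char'134}emergencystretch=12pt},
- the final pass evaluates the same line as if $y=3\,{\rm pt}+12\,{\rm pt}$:
- $$100\,(15/15)^3=100.$$
- Lines that were all ``infinitely bad'' now receive different finite
- badnesses, so \TeX\ can spread the looseness over several lines
- instead of concentrating it in one.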
-
- \bigskip\noindent
- {\bf 8. Looking at badness.}
- \TeX\ has a new internal integer parameter called
- {\tt{\char'134}badness}
- that records the badness of the box it has most recently constructed.
- If that box was overfull,
- {\tt{\char'134}badness}
- will be 1000000; otherwise
- {\tt{\char'134}badness}
- will be between~0 and~10000.
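-
- A macro can therefore test whether a box it has just built came out
- overfull. In the following sketch the box contents and the width
- of~1pt are hypothetical, chosen so that the box is certain to be
- overfull (\TeX\ will also issue its usual warning when the box is built):
-
- \halign{\qquad\qquad{\tt{#}}\hfil\cr
- {\char'134}setbox0={\char'134}hbox to 1pt{\char'173}abc{\char'175}\cr
- {\char'134}ifnum{\char'134}badness{\char'076}10000
- {\char'134}message{\char'173}box 0 is overfull{\char'175}{\char'134}fi\cr}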
-
- \bigskip\noindent
- {\bf 9. Looking at the line number.}
- \TeX\ also has a new internal integer parameter called
- {\tt{\char'134}inputlineno},
- which contains the number of the line that \TeX\ would show on an error message
- if an error occurred now. (This parameter and
- {\tt{\char'134}badness}
- are ``read only'' in the same way as
- {\tt{\char'134}lastpenalty}:
- You can use them in the context of a $\langle$number$\rangle$, e.g., by saying
- `{\tt{\char'134}ifnum{\char'134}inputlineno{\char'076\char'134}badness ...\
- {\char'134}fi}'
- or
- `{\tt{\char'134}the{\char'134}inputlineno}',
- but you cannot set them to new values.)
-
- \bigskip\noindent
- {\bf 10. Not looking at error context.}
- There's a new integer parameter called
- {\tt{\char'134}errorcontextlines}
- that specifies the maximum number of two-line pairs of context displayed with
- \TeX's error messages (in addition to the top and bottom lines, which always
- appear). Plain \TeX\ now sets
- {\tt{\char'134}errorcontextlines=5},
- but higher level format packages might prefer
- {\tt{\char'134}errorcontextlines=1}
- or even
- {\tt{\char'134}errorcontextlines=0}.
- In the last case, an error that previously involved three or more pairs of
- context would now appear as follows:
-
- \halign{\qquad\qquad{\tt{#}}\hfil\cr
- {\char'041} Error.\cr
- $\langle$somewhere$\rangle$ The {\char'134}top\cr
- \phantom{$\langle$somewhere$\rangle$ The {\char'134}top\ }line\cr
- ...\cr
- l.123 {\char'134}The\cr
- \phantom{l.123 {\char'134}The\ }bottom line.\cr}
-
- \noindent
- (If
- {\tt{\char'134}errorcontextlines{\char'074}0}
- you wouldn't even see the `{\tt{...}}' here.)
-
- \bigskip\noindent
- {\bf 11. Output recycling.}
- One more new integer parameter completes the set. If
- {\tt{\char'134}holdinginserts{\char'076}0}
- when \TeX\ is putting the current page into
- {\tt{\char'134}box255}
- for the
- {\tt{\char'134}output}
- routine, \TeX\ will not move anything from insertion nodes into the
- corresponding boxes; all insertion nodes will stay in place. Designers of
- output routines can use this when they want to put the contents of box~255 back
- into the current page to be re-broken (because they might want to change
- {\tt{\char'134}vsize}
- or something).
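-
- A minimal sketch of such an output routine follows, assuming plain \TeX's
- {\tt{\char'134}plainoutput};
- the new page height of 7in is a hypothetical value.
-
- \halign{\qquad\qquad{\tt{#}}\hfil\cr
- {\char'134}holdinginserts=1\cr
- {\char'134}output={\char'173}{\char'134}global{\char'134}vsize=7in\cr
- \ \ {\char'134}global{\char'134}holdinginserts=0\cr
- \ \ {\char'134}global{\char'134}output={\char'173}{\char'134}plainoutput{\char'175}\cr
- \ \ {\char'134}unvbox255 {\char'134}penalty{\char'134}outputpenalty{\char'175}\cr}
-
- \noindent
- When the first page is completed, its insertions are still in place
- inside box~255, so this routine can change
- {\tt{\char'134}vsize},
- restore the normal settings, and return the material to the current
- page to be broken again.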
-
- \bigskip\noindent
- {\bf 12. Exceptions to upward compatibility.}
- The new features of \TeX\ and \MF\ imply that a few things work differently
- than before. I~will try to list all such cases here (except when the
- previous behavior was erroneous due to a bug in \TeX\ or \MF\null).
- I~don't know of any cases where users will actually be affected, because
- all of these exceptions are pretty esoteric.
-
- \medskip $\bullet$\enspace
- \TeX\ used to convert the character strings
- {\tt{\char'136\char'136}0},
- {\tt{\char'136\char'136}1},
- \dots,
- {\tt{\char'136\char'136}9},
- {\tt{\char'136\char'136}a},
- {\tt{\char'136\char'136}b},
- {\tt{\char'136\char'136}c},
- {\tt{\char'136\char'136}d},
- {\tt{\char'136\char'136}e},
- {\tt{\char'136\char'136}f}
- into the respective single characters
- {\tt p},
- {\tt q},
- \dots,
- {\tt y},
- {\tt{\char'041}},
- {\tt "},
- {\tt{\char'043}},
- {\tt{\char'044}},
- {\tt{\char'045}},
- {\tt{\char'046}}.
- It will no longer do this if the following character is one of the characters
- {\tt 0123456789abcdef}.
-
- \medskip $\bullet$\enspace
- \TeX\ used to insert no character at the end of an input line if
- {\tt{\char'134}endlinechar{\char'076}127}.
- It will now insert a character unless
- {\tt{\char'134}endlinechar{\char'076}255}.
- (As previously,
- {\tt{\char'134}endlinechar{\char'074}0}
- suppresses the end-of-line character. This character is normally
- $13=$ ASCII control--M $=$ carriage return.)
-
- \medskip $\bullet$\enspace
- Some diagnostic messages from \TeX\ used to have the notation
- {\tt ["80]} \dots {\tt ["FF]}
- when referring to characters $128\ldt 255$ (for example when displaying the
- contents of an overfull box involving fonts that include such characters).
- The notation
- {\tt{\char'136\char'136}80} $\ldots$
- {\tt{\char'136\char'136}ff}
- is now used instead.
-
- \medskip $\bullet$\enspace
- The expressions
- {\tt{char128}} and {\tt{char0}} used to be equivalent in \MF; now
- {\bf char} is defined modulo~256 instead. Hence {\tt{char-1}} $=$
- {\tt{char255}}, etc.
-
- \medskip $\bullet$\enspace
- {\tt INITEX} used to forget all previous hyphenation patterns each time
- you specified
- {\tt{\char'134}patterns}.
- Now all hyphenation pattern specifications are cumulative, and you are not
- permitted to use
- {\tt{\char'134}patterns}
- after a paragraph has been hyphenated by {\tt INITEX}.
-
- \medskip $\bullet$\enspace
- \TeX\ used to act a bit differently when you tried to typeset missing
- characters of a font. A~missing character is now considered to be a word
- boundary, so you will get slightly more diagnostic output when
- {\tt{\char'134}tracingcommands{\char'076}0}.
-
- \medskip $\bullet$\enspace
- \TeX\ and \MF\ will report different statistics at the end of a run because
- they now have a different number of primitives.
-
- \medskip $\bullet$\enspace
- Programs that use the string pool feature of {\tt TANGLE} will no longer run
- without changes, because the new {\tt TANGLE} starts numbering multicharacter
- strings at~256 instead of~128.
-
- \medskip $\bullet$\enspace
- {\tt INITEX} programs must now set
- {\tt{\char'134}lefthyphenmin=2} and
- {\tt{\char'134}righthyphenmin=3}
- in order to reproduce their previous behavior.
-
- \bye